Introduction to Adaptive Filters


In many applications requiring filtering, the necessary frequency response
may not be known beforehand, or it may vary with time. (Example:
suppression of engine harmonics in a car stereo.) In such applications, an
adaptive filter, which can automatically design itself and track
system variations in time, is extremely useful. Adaptive filters are used
extensively in a wide variety of applications, particularly in
telecommunications.

Outline of adaptive filter material 


1. Wiener filters: $L_2$-optimal (FIR) filter design in a statistical context

2. LMS algorithm: the simplest and by far the most commonly used
adaptive filter algorithm

3. Stability and performance of the LMS algorithm: when and how
well it works

4. Applications of adaptive filters: overview of important applications

5. Introduction to advanced adaptive filter algorithms: techniques for
special situations or faster convergence


Discrete-Time, Causal Wiener Filter 


Stochastic $L_2$-optimal (least squares) FIR filter design problem: given a wide-sense stationary (WSS) input signal
$x_k$ and desired signal $d_k$ (WSS: $E[y_k] = E[y_{k+m}]$ and $r_{yz}(l) = E[y_k z_{k+l}]$ do not depend on $k$, for all $k, m, l$, with $r_{yy}(0) < \infty$),


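The Wiener filter is the linear, time-invariant filter minimizing $E[\epsilon_k^2]$, the mean-squared value of the error $\epsilon_k = d_k - y_k$ between the desired signal and the filter output $y_k$ (the error variance, for zero-mean signals).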


As posed, this problem seems slightly silly, since $d_k$ is already available! However, this idea is useful in a wide
variety of applications.


Example: 
active suspension system design 


[Figure: active suspension system: $x_k$ = road level, $W$ = suspension system, $y_k$ = level of car body, $d_k$ = constant]


Note: The optimal system may change with different road conditions or mass in the car, so an adaptive system might be
desirable.


Example: 
System identification (radar, non-destructive testing, adaptive control systems) 


Exercise: 


Problem: 


Usually one desires that the input signal $x_k$ be "persistently exciting," which, among other things, implies
non-zero energy in all frequency bands. Why is this desirable?


Determining the optimal length-M causal FIR Wiener filter


Note: for convenience, we will analyze only the causal, real-data case; extensions are straightforward. 
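$$E[\epsilon_k^2] = E\Big[\Big(d_k - \sum_{l=0}^{M-1} w_l x_{k-l}\Big)^2\Big] = r_{dd}(0) - 2\sum_{l=0}^{M-1} w_l\, r_{dx}(l) + \sum_{l=0}^{M-1}\sum_{m=0}^{M-1} w_l w_m\, r_{xx}(l-m)$$
where


$$r_{dd}(0) = E[d_k^2]$$
$$r_{dx}(l) = E[d_k x_{k-l}]$$
$$r_{xx}(l-m) = E[x_{k-l}\, x_{k-m}]$$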


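This can be written in matrix form as


$$E[\epsilon_k^2] = r_{dd}(0) - 2P^T W + W^T R W$$
where $W = [w_0, w_1, \ldots, w_{M-1}]^T$ and


$$P = \begin{pmatrix} r_{dx}(0) \\ r_{dx}(1) \\ \vdots \\ r_{dx}(M-1) \end{pmatrix}, \qquad
R = \begin{pmatrix} r_{xx}(0) & r_{xx}(1) & \cdots & r_{xx}(M-1) \\ r_{xx}(1) & r_{xx}(0) & \cdots & r_{xx}(M-2) \\ \vdots & \vdots & \ddots & \vdots \\ r_{xx}(M-1) & r_{xx}(M-2) & \cdots & r_{xx}(0) \end{pmatrix}$$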
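To solve for the optimum filter, compute the gradient with respect to the tap weights vector $W$:


$$\nabla = \frac{\partial E[\epsilon_k^2]}{\partial W} = -2P + 2RW$$
(recall $\frac{\partial}{\partial W}(A^T W) = A$ and $\frac{\partial}{\partial W}(W^T M W) = 2MW$ for symmetric $M$). Setting the gradient equal to zero gives

$$R W_{opt} = P \;\Rightarrow\; W_{opt} = R^{-1}P$$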


Since R is a correlation matrix, it must be non-negative definite, so this is a minimizer. For R positive definite, the 
minimizer is unique. 
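The closed-form solution can be checked numerically. The following is a minimal sketch, assuming the correlations $r_{xx}(l)$ and $r_{dx}(l)$ are known exactly; the helper names `solve_linear` and `wiener_filter` are hypothetical, and plain Gaussian elimination with partial pivoting stands in for whatever solver one would actually use.

```python
def solve_linear(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy [A | b] so the caller's data is untouched.
    aug = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = aug[r][n] - sum(aug[r][c] * w[c] for c in range(r + 1, n))
        w[r] = acc / aug[r][r]
    return w


def wiener_filter(r_xx, r_dx):
    """Length-M Wiener filter W_opt = R^{-1} P from known correlations.

    r_xx[l] = E[x_k x_{k-l}], l = 0..M-1 (R is symmetric Toeplitz);
    r_dx[l] = E[d_k x_{k-l}], l = 0..M-1 (the vector P).
    """
    M = len(r_dx)
    R = [[r_xx[abs(l - m)] for m in range(M)] for l in range(M)]
    return solve_linear(R, r_dx)
```

For example, `wiener_filter([2.0, 1.0], [1.0, 0.0])` returns approximately `[0.667, -0.333]`. For white input ($r_{xx}(l) = 0$ for $l \neq 0$), R is diagonal and the filter is simply $r_{dx}(l)/r_{xx}(0)$.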


Practical Issues in Wiener Filter Implementation 


The Wiener filter, $W_{opt} = R^{-1}P$, is ideal for many applications. But
several issues must be addressed to use it in practice.
Exercise: 


Problem: 


In practice one usually won't know exactly the statistics of $x_k$ and $d_k$
(i.e., $R$ and $P$) needed to compute the Wiener filter.


How do we surmount this problem? 
Solution: 


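Estimate the statistics:


$$\hat r_{xx}(l) \approx \frac{1}{N}\sum_{k=0}^{N-1} x_k\, x_{k-l}, \qquad \hat r_{dx}(l) \approx \frac{1}{N}\sum_{k=0}^{N-1} d_k\, x_{k-l}$$

then solve $\hat W_{opt} = \hat R^{-1}\hat P$.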
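A sketch of these sample estimates, assuming finite records `x` and `d` of equal length N; the function names are hypothetical. Products that would need samples before time 0 are dropped, but the sums are still divided by N (the usual biased estimate).

```python
def estimate_r_xx(x, M):
    """Biased estimate r_xx(l) ~ (1/N) * sum_k x_k x_{k-l}, for l = 0..M-1."""
    N = len(x)
    return [sum(x[k] * x[k - l] for k in range(l, N)) / N for l in range(M)]


def estimate_r_dx(d, x, M):
    """Biased estimate r_dx(l) ~ (1/N) * sum_k d_k x_{k-l}, for l = 0..M-1."""
    N = len(x)
    return [sum(d[k] * x[k - l] for k in range(l, N)) / N for l in range(M)]


def build_normal_equations(x, d, M):
    """Return (R_hat, P_hat): the Toeplitz matrix and vector for W = R^{-1} P."""
    r = estimate_r_xx(x, M)
    R_hat = [[r[abs(i - j)] for j in range(M)] for i in range(M)]
    return R_hat, estimate_r_dx(d, x, M)
```

One advantage of the biased estimate is that the resulting $\hat R$ is always non-negative definite; the returned `R_hat` and `P_hat` can then be handed to any linear solver.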
Exercise: 


Problem: 
In many applications, the statistics of $x_k$, $d_k$ vary slowly with time.


How does one develop an adaptive system which tracks these changes 
over time to keep the system near optimal at all times? 


Solution: 


Use short-time windowed estimates of the correlation functions.


Note:


$$\hat r_{xx}^k(l) = \frac{1}{N}\sum_{m=0}^{N-1} x_{k-m}\, x_{k-m-l}$$

Exercise:


Problem: How can $\hat r_{xx}^k(l)$ be computed efficiently?
Solution: 


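Recursively!


$$\hat r_{xx}^k(l) = \hat r_{xx}^{k-1}(l) + \frac{1}{N}\big(x_k\, x_{k-l} - x_{k-N}\, x_{k-N-l}\big)$$


This is critically stable (round-off errors are never forgotten), so people usually use an exponentially weighted estimate instead:

$$\hat r_{xx}^k(l) = \alpha\, \hat r_{xx}^{k-1}(l) + (1-\alpha)\, x_k\, x_{k-l}, \qquad 0 < \alpha < 1$$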
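Both recursions can be sketched in a few lines; the function names are hypothetical, and samples before time 0 are assumed to be zero. The direct windowed sum costs $O(N)$ per output sample, while the recursive versions cost $O(1)$.

```python
def sliding_correlation(x, N, lag):
    """r^k(lag) = (1/N) sum_{m=0}^{N-1} x_{k-m} x_{k-m-lag}, for every k.

    Each step adds the newest product and removes the product that leaves
    the length-N window: O(1) work per sample.
    """
    def sample(i):
        return x[i] if i >= 0 else 0.0   # zero before time 0

    estimates, r = [], 0.0
    for k in range(len(x)):
        r += (sample(k) * sample(k - lag) - sample(k - N) * sample(k - N - lag)) / N
        estimates.append(r)
    return estimates


def exponential_correlation(x, alpha, lag):
    """Leaky estimate r^k(lag) = alpha r^{k-1}(lag) + (1 - alpha) x_k x_{k-lag}."""
    estimates, r = [], 0.0
    for k in range(len(x)):
        x_lag = x[k - lag] if k >= lag else 0.0
        r = alpha * r + (1.0 - alpha) * x[k] * x_lag
        estimates.append(r)
    return estimates
```

In exact arithmetic `sliding_correlation` matches the direct windowed sum; in floating point its round-off errors accumulate without decay, which is the practical meaning of "critically stable." The leaky version forgets old errors at rate $\alpha$, at the price of an effective window of roughly $1/(1-\alpha)$ samples.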


Exercise: 


Problem: how does one choose N? 


Tradeoffs 


Larger N gives more accurate estimates of the correlation values, and hence a better
$\hat W_{opt}$. However, larger N leads to slower adaptation.


Note: The success of adaptive systems depends on $x$, $d$ being roughly
stationary over at least N samples, with N > M. That is, all adaptive filtering
algorithms require that the underlying system varies slowly with respect to
the sampling rate and the filter length (although they can tolerate
occasional step discontinuities in the underlying system).


Computational Considerations 


As presented here, an adaptive filter requires computing a matrix inverse at
each sample. Actually, since the matrix R is Toeplitz, the linear system of
equations can be solved with $O(M^2)$ computations using Levinson's
algorithm, where M is the filter length. However, in many applications this
may be too expensive, especially since computing the filter output itself
requires only $O(M)$ computations. There are two main approaches to resolving
the computation problem:


1. Take advantage of the fact that $R^{k+1}$ is only slightly changed from $R^k$
to reduce the computation to $O(M)$; these algorithms are called Fast
Recursive Least Squares algorithms; all methods proposed so far
have stability problems and are dangerous to use.

2. Find a different approach to solving the optimization problem that 
doesn't require explicit inversion of the correlation matrix. 


Note: Adaptive algorithms involving the correlation matrix are called
Recursive Least Squares (RLS) algorithms. Historically, they were
developed after the LMS algorithm, which is the simplest and most widely
used approach, at $O(M)$ per sample. $O(M^2)$ RLS algorithms are used in applications
requiring very fast adaptation.
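Levinson's algorithm, mentioned above, can be sketched as follows. This is the classical Levinson recursion for a symmetric positive definite Toeplitz system, which carries along the order-k Yule-Walker (Durbin) solution in order to extend the order-k solution to order k+1; the name `levinson_solve` is hypothetical, and the sketch assumes $r(0) > 0$ and R positive definite.

```python
def levinson_solve(r, p):
    """Solve R w = p where R is symmetric positive definite Toeplitz.

    r = [r(0), r(1), ..., r(M-1)] is the first row of R; p has length M.
    Each order update costs O(k), so the whole solve is O(M^2).
    """
    M = len(p)
    t = [r[i] / r[0] for i in range(1, M)]   # normalize so the diagonal is 1
    b = [pi / r[0] for pi in p]
    x = [b[0]]                                # solution of the order-1 system
    if M == 1:
        return x
    y = [-t[0]]                               # order-1 Yule-Walker (Durbin) vector
    beta, alpha = 1.0, -t[0]
    for k in range(1, M):
        beta *= 1.0 - alpha * alpha
        # Extend the solution of the order-k system to order k + 1.
        gamma = (b[k] - sum(t[i] * x[k - 1 - i] for i in range(k))) / beta
        x = [x[i] + gamma * y[k - 1 - i] for i in range(k)] + [gamma]
        if k < M - 1:
            # Extend the Yule-Walker vector, needed at the next order.
            alpha = (-t[k] - sum(t[i] * y[k - 1 - i] for i in range(k))) / beta
            y = [y[i] + alpha * y[k - 1 - i] for i in range(k)] + [alpha]
    return x
```

For instance, `levinson_solve([2.0, 1.0], [1.0, 0.0])` gives approximately `[0.667, -0.333]`. SciPy's `scipy.linalg.solve_toeplitz` provides a library implementation of the same idea.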


Quadratic Minimization and Gradient Descent 


Quadratic minimization problems 


The least squares optimal filter design problem is quadratic in the filter
coefficients:


$$E[\epsilon_k^2] = r_{dd}(0) - 2P^T W + W^T R W$$


If R is positive definite, the error surface $E[\epsilon_k^2](w_0, w_1, \ldots, w_{M-1})$ is a
unimodal "bowl" in $\mathbb{R}^M$.


The problem is to find the bottom of the bowl. In an adaptive filter context, 
the shape and bottom of the bowl may drift slowly with time; hopefully 
slow enough that the adaptive algorithm can track it. 


For a quadratic error surface, the bottom of the bowl can be found in one
step by computing $R^{-1}P$. Most modern nonlinear optimization methods
(which are used, for example, to solve the optimal IIR filter design
problem!) locally approximate a nonlinear function with a second-order
(quadratic) Taylor series approximation and step to the bottom of this
quadratic approximation on each iteration. However, an older and simpler
approach to nonlinear optimization exists, based on gradient descent.

[Figure: contour plot of $\epsilon$-squared]


The idea is to iteratively find the minimizer by computing the gradient of
the error function, $\nabla = \frac{\partial E[\epsilon_k^2]}{\partial W}$. The gradient is a vector in $\mathbb{R}^M$ pointing
in the steepest uphill direction on the error surface at a given point $W^i$,
with a magnitude proportional to the slope of the error surface in
this steepest direction.


By updating the coefficient vector by taking a step opposite the gradient
direction, $W^{i+1} = W^i - \mu\nabla^i$, we go (locally) "downhill" in the steepest
direction, which seems to be a sensible way to iteratively solve a nonlinear
optimization problem. The performance obviously depends on $\mu$; if $\mu$ is too
large, the iterations could bounce back and forth up out of the bowl.
However, if $\mu$ is too small, it could take many iterations to approach the
bottom. We will determine criteria for choosing $\mu$ later.


In summary, the gradient descent algorithm for solving the Wiener filter
problem is:

Guess $W^0$
do $i = 1, \infty$
    $\nabla^i = -2P + 2RW^i$
    $W^{i+1} = W^i - \mu\nabla^i$
repeat


The gradient descent idea is used in the LMS adaptive filter algorithm. As
presented, this algorithm costs $O(M^2)$ computations per iteration and
doesn't appear very attractive, but LMS only requires $O(M)$ computations
and is stable, so it is very attractive when computation is an issue, even
though it converges more slowly than the RLS algorithms we have
discussed so far.
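A minimal sketch of this iteration, assuming R and P are known exactly (the function name is hypothetical). Each step uses the exact gradient, so it costs $O(M^2)$.

```python
def gradient_descent_wiener(R, P, mu, iterations, W0=None):
    """Steepest descent on E[eps^2] = r_dd(0) - 2 P^T W + W^T R W."""
    M = len(P)
    W = list(W0) if W0 is not None else [0.0] * M      # initial guess W^0
    for _ in range(iterations):
        RW = [sum(R[i][j] * W[j] for j in range(M)) for i in range(M)]
        grad = [-2.0 * P[i] + 2.0 * RW[i] for i in range(M)]   # -2P + 2RW
        W = [W[i] - mu * grad[i] for i in range(M)]    # step downhill
    return W
```

With `R = [[2.0, 1.0], [1.0, 2.0]]` and `P = [1.0, 0.0]` (eigenvalues of R are 1 and 3), `mu=0.1` converges to approximately `[0.667, -0.333]`, whereas `mu=0.5` makes the iterates bounce up out of the bowl and grow without limit.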


The LMS Adaptive Filter Algorithm 


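Recall the Wiener filter problem:


$\{x_k\}$, $\{d_k\}$ jointly wide sense stationary


Find $W$ minimizing $E[\epsilon_k^2]$, where $\epsilon_k = d_k - y_k = d_k - W^T X^k$ and $X^k = [x_k, x_{k-1}, \ldots, x_{k-M+1}]^T$.


The superscript denotes absolute time, and the subscript denotes time or a
vector index.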


The solution can be found by setting the gradient to 0:


$$\nabla^k = \frac{\partial E[\epsilon_k^2]}{\partial W} = E\big[2\epsilon_k(-X^k)\big] = E\big[-2(d_k - X^{kT}W)X^k\big] = -2E[d_k X^k] + 2E[X^k X^{kT}]W = -2P + 2RW$$


$$\Rightarrow W_{opt} = R^{-1}P$$


Alternatively, $W_{opt}$ can be found iteratively using a gradient descent
technique:


$$W^{k+1} = W^k - \mu\nabla^k$$


In practice, we don't know R and P exactly, and in an adaptive context they 
may be slowly varying with time. 


To find the (approximate) Wiener filter, some approximations are necessary. 
As always, the key is to make the right approximations! 


Note: Approximate $R$ and $P$ $\Rightarrow$ RLS methods, as discussed earlier.


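Note: Approximate the gradient!


$$\nabla^k = \frac{\partial E[\epsilon_k^2]}{\partial W}$$


Note that $\epsilon_k^2$ itself is a very noisy approximation to $E[\epsilon_k^2]$. We can get a
noisy approximation to the gradient by finding the gradient of $\epsilon_k^2$! Widrow
and Hoff first published the LMS algorithm, based on this clever idea, in
1960.


$$\hat\nabla^k = \frac{\partial \epsilon_k^2}{\partial W} = 2\epsilon_k\frac{\partial \epsilon_k}{\partial W} = 2\epsilon_k(-X^k) = -2\epsilon_k X^k$$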


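This yields the LMS adaptive filter algorithm.


Example:
The LMS Adaptive Filter Algorithm


1. $y_k = W^{kT}X^k = \sum_{i=0}^{M-1} w_i^k x_{k-i}$
2. $\epsilon_k = d_k - y_k$
3. $W^{k+1} = W^k - \mu\hat\nabla^k = W^k - \mu(-2\epsilon_k X^k) = W^k + 2\mu\epsilon_k X^k$ (that is, $w_i^{k+1} = w_i^k + 2\mu\epsilon_k x_{k-i}$)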

The LMS algorithm is often called a stochastic gradient algorithm, since
$\hat\nabla^k$ is a noisy gradient. This is by far the most commonly used adaptive
filtering algorithm, because

- it was the first
- it is very simple
- in practice it works well (except that sometimes it converges slowly)
- it requires relatively little computation
- it updates the tap weights every sample, so it continually adapts the
  filter
- it tracks slow changes in the signal statistics well

Computational Cost of LMS


To compute:   y_k     ε_k    W^{k+1}    Total
multiplies    M       0      M+1        2M+1
adds          M-1     1      M          2M


So the LMS algorithm is O(M) per sample. In fact, it is nicely balanced in 
that the filter computation and the adaptation require the same amount of 
computation. 


Note that the parameter $\mu$ plays a very important role in the LMS algorithm.
It can also be varied with time, but usually a constant $\mu$ ("convergence
weight factor") is used, chosen after experimentation for a given application.


Tradeoffs
large $\mu$: fast convergence, fast adaptivity


small $\mu$: accurate $W$ (less misadjustment error), stability


First Order Convergence Analysis of the LMS Algorithm 


Analysis of the LMS algorithm 


It is important to analyze the LMS algorithm to determine under what conditions it
is stable, whether or not it converges to the Wiener solution, how
quickly it converges, how much degradation is suffered due to the noisy gradient,
etc. In particular, we need to know how to choose the parameter $\mu$.


Mean of W


Does $W^k$, $k \to \infty$, approach the Wiener solution? (Since $W^k$ is always somewhat
random in the approximate gradient-based LMS algorithm, we ask whether the
expected value of the filter coefficients converges to the Wiener solution.)


$$E[W^{k+1}] = E[W^k + 2\mu\epsilon_k X^k] = E[W^k] + 2\mu E[d_k X^k] + 2\mu E\big[-(W^{kT}X^k)X^k\big] = E[W^k] + 2\mu P - 2\mu E\big[(W^{kT}X^k)X^k\big]$$


Patently False Assumption


$X^k$ and $X^{k-i}$, $X^k$ and $d_{k-i}$, and $d_k$ and $d_{k-i}$ are statistically independent for $i \neq 0$.
This assumption is obviously false, since $X^{k-1}$ is the same as $X^k$ except for
shifting down the vector elements one place and adding one new sample. We
make this assumption because otherwise it becomes extremely difficult to analyze
the LMS algorithm. (First good analysis not making this assumption: Macchi and
Eweda.) Many simulations and much practical experience have shown that the
results one obtains with analyses based on the patently false assumption above are
quite accurate in most situations.


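With the independence assumption, $W^k$ (which depends only on previous $X^{k-i}$,
$d_{k-i}$) is statistically independent of $X^k$, and we can simplify $E[(W^{kT}X^k)X^k]$.


Now $(W^{kT}X^k)X^k$ is a vector, and its $j$th element satisfies


$$E\big[(W^{kT}X^k)\,x_{k-j}\big] = E\Big[\sum_{i=0}^{M-1} w_i^k x_{k-i}\, x_{k-j}\Big] = \sum_{i=0}^{M-1} E[w_i^k]\, E[x_{k-i}\, x_{k-j}] = \sum_{i=0}^{M-1} E[w_i^k]\, r_{xx}(i-j)$$

so that

$$E\big[(W^{kT}X^k)X^k\big] = R\,E[W^k]$$
where $R = E[X^k X^{kT}]$ is the data correlation matrix.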


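Putting this back into our equation,


$$E[W^{k+1}] = E[W^k] + 2\mu P - 2\mu R\,E[W^k] = (I - 2\mu R)\,E[W^k] + 2\mu P$$


Now if $E[W^k]$ converges to a vector of finite magnitude ("convergence in the
mean"), what does it converge to?


If $E[W^k]$ converges to some $W^\infty$, then as $k \to \infty$, $E[W^{k+1}] \approx E[W^k] \approx W^\infty$, and

$$W^\infty = (I - 2\mu R)W^\infty + 2\mu P$$
$$2\mu R W^\infty = 2\mu P$$
$$R W^\infty = P$$
or
$$W^\infty = R^{-1}P = W_{opt}$$

the Wiener solution!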


So the LMS algorithm, if it converges, gives filter coefficients which on average 
are the Wiener coefficients! This is, of course, a desirable result. 


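First-order stability
But does $E[W^k]$ converge, and if so, under what conditions?


Let's rewrite the analysis in terms of $V^k$, the "mean coefficient error vector"

$$V^k = E[W^k] - W_{opt}$$
where $W_{opt}$ is the Wiener filter. Starting from

$$E[W^{k+1}] = E[W^k] - 2\mu R\,E[W^k] + 2\mu P$$

subtract $W_{opt}$ from both sides and add and subtract $2\mu R W_{opt}$:

$$E[W^{k+1}] - W_{opt} = E[W^k] - W_{opt} - 2\mu R\,E[W^k] + 2\mu R W_{opt} - 2\mu R W_{opt} + 2\mu P$$
$$V^{k+1} = V^k - 2\mu R V^k - 2\mu R W_{opt} + 2\mu P$$

Now $W_{opt} = R^{-1}P$, so

$$V^{k+1} = V^k - 2\mu R V^k - 2\mu R R^{-1}P + 2\mu P = (I - 2\mu R)V^k$$

We wish to know under what conditions $V^k \to 0$.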


Linear Algebra Fact 


Since R is positive definite, real, and symmetric, all the eigenvalues are real and
positive. Also, we can write R as $Q^{-1}\Lambda Q$, where $\Lambda$ is a diagonal matrix with
diagonal entries $\lambda_i$ equal to the eigenvalues of R, and Q is a unitary (here real orthogonal, so $Q^{-1} = Q^T$) matrix with
rows equal to the eigenvectors corresponding to the eigenvalues of R.


Using this fact,
$$V^{k+1} = \big(I - 2\mu(Q^{-1}\Lambda Q)\big)V^k$$


multiplying both sides through on the left by $Q$, we get


$$QV^{k+1} = (Q - 2\mu\Lambda Q)V^k = (I - 2\mu\Lambda)QV^k$$
Let $V' = QV$:
$$V'^{k+1} = (I - 2\mu\Lambda)V'^k$$


Note that $V'$ is simply $V$ in a rotated coordinate set in $\mathbb{R}^M$, so convergence of $V'$
implies convergence of $V$.


Since $I - 2\mu\Lambda$ is diagonal, all elements of $V'$ evolve independently of each other.
Convergence (stability) boils down to whether all M of these scalar, first-order
difference equations are stable, and thus $\to 0$.

$$\forall i \in \{1, 2, \ldots, M\}: \quad V_i'^{k+1} = (1 - 2\mu\lambda_i)V_i'^k$$
These equations converge to zero if $|1 - 2\mu\lambda_i| < 1$, or $\forall i: |\mu\lambda_i| < 1$. Since $\mu$ and $\lambda_i$
are positive, we require $\forall i: \mu < \frac{1}{\lambda_i}$, so for convergence in the mean of the
LMS adaptive filter, we require

$$\mu < \frac{1}{\lambda_{max}}$$


This is an elegant theoretical result, but in practice, we may not know $\lambda_{max}$, it may
be time-varying, and we certainly won't want to compute it. However, another
useful mathematical fact comes to the rescue...


Since the eigenvalues are all positive and real,

$$\lambda_{max} \le \sum_{i=1}^{M}\lambda_i = \mathrm{tr}(R)$$


For a correlation matrix, $\forall i \in \{1, \ldots, M\}: r_{ii} = r(0)$. So
$\mathrm{tr}(R) = M r(0) = M E[x_k x_k]$. We can easily estimate $r(0)$ with $O(1)$
computations/sample, so in practice we might require


$$\mu < \frac{1}{M r(0)}$$


as a conservative bound, and perhaps adapt $\mu$ accordingly with time.
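Both a batch and a running version of this conservative bound are easy to compute. A sketch with hypothetical function names; the leaky average with forgetting factor `alpha` is an assumed way to track $r(0)$ in $O(1)$ operations per sample.

```python
def conservative_mu_bound(x, M):
    """Batch version: estimate r(0) = E[x_k^2] and return 1 / (M r(0))."""
    r0 = sum(v * v for v in x) / len(x)
    return 1.0 / (M * r0)


def running_mu_bound(x, M, alpha=0.99, floor=1e-12):
    """O(1)-per-sample version: track r(0) with a leaky average and
    return the bound 1 / (M r0_hat) after every sample.

    `floor` keeps the bound finite while the power estimate is still ~0.
    """
    power, bounds = 0.0, []
    for v in x:
        power = alpha * power + (1.0 - alpha) * v * v
        bounds.append(1.0 / (M * max(power, floor)))
    return bounds
```

For a 4-tap filter driven by a signal of power 4, both versions settle at $1/(4 \cdot 4) = 0.0625$; early in the running version the bound is larger because the power estimate has not yet built up, so a practical scheme would typically start from a smaller fixed $\mu$.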


Rate of convergence 


Each of the modes decays as


$$(1 - 2\mu\lambda_i)^k$$


Note: The initial rate of convergence is dominated by the fastest mode,
$1 - 2\mu\lambda_{max}$. This is not surprising, since a gradient descent method goes
"downhill" in the steepest direction.


Note: The final rate of convergence is dominated by the slowest mode,
$1 - 2\mu\lambda_{min}$. For small $\lambda_{min}$, it can take a long time for LMS to converge.


Note that the convergence behavior depends on the data (via R). LMS converges 
relatively quickly for roughly equal eigenvalues. Unequal eigenvalues slow LMS 
down a lot. 
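To make the effect of eigenvalue spread concrete, the sketch below (hypothetical helper) computes, for each mode, the number of iterations needed for $|1 - 2\mu\lambda_i|^k$ to shrink by a factor of e. The eigenvalues 1.9 and 0.1 in the example are those of the assumed correlation matrix $R = \begin{pmatrix}1 & 0.9\\ 0.9 & 1\end{pmatrix}$ (a strongly lowpass input).

```python
import math


def mode_time_constants(eigenvalues, mu):
    """Iterations for each LMS mode (1 - 2 mu lambda_i)^k to shrink by a factor e."""
    taus = []
    for lam in eigenvalues:
        rho = abs(1.0 - 2.0 * mu * lam)
        if rho >= 1.0:
            taus.append(math.inf)       # mode does not decay: mu is too large
        elif rho == 0.0:
            taus.append(0.0)            # mode is removed in a single step
        else:
            taus.append(-1.0 / math.log(rho))
    return taus
```

With $\mu = 0.2$ (below $1/\lambda_{max} \approx 0.53$), `mode_time_constants([1.9, 0.1], 0.2)` gives a time constant of about 0.7 iterations for the fast mode but about 24.5 iterations for the slow one.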


Second-order Convergence Analysis of the LMS Algorithm and Misadjustment Error 


Convergence of the mean (first-order analysis) is insufficient to guarantee desirable behavior of the LMS 
algorithm; the variance could still be infinite. It is important to show that the variance of the filter coefficients is 
finite, and to determine how close the average squared error is to the minimum possible error using an exact 
Wiener filter. 

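$$E[\epsilon_k^2] = E\big[(d_k - W^{kT}X^k)^2\big] = E\big[d_k^2 - 2d_k X^{kT}W^k + W^{kT}X^kX^{kT}W^k\big] = r_{dd}(0) - 2W^{kT}P + W^{kT}RW^k$$


The minimum error is obtained using the Wiener filter
$W_{opt} = R^{-1}P$:

$$\epsilon_{min}^2 = E[\epsilon_k^2]\big|_{W = W_{opt}} = r_{dd}(0) - 2P^TR^{-1}P + P^TR^{-1}RR^{-1}P = r_{dd}(0) - P^TR^{-1}P$$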
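To analyze the average error in LMS, write the expression for $E[\epsilon_k^2]$ above in terms of $V'^k = Q(W^k - W_{opt})$, where $Q^T\Lambda Q = R$:

$$E[\epsilon_k^2] = r_{dd}(0) - 2W^{kT}P + W^{kT}RW^k + \big(-W^{kT}RW_{opt} - W_{opt}^TRW^k + W_{opt}^TRW_{opt}\big) + \big(W^{kT}RW_{opt} + W_{opt}^TRW^k - W_{opt}^TRW_{opt}\big)$$
$$= r_{dd}(0) + V^{kT}RV^k - P^TR^{-1}P$$
$$= \epsilon_{min}^2 + V^{kT}RV^k$$
$$= \epsilon_{min}^2 + V^{kT}Q^TQRQ^TQV^k$$
$$= \epsilon_{min}^2 + V'^{kT}\Lambda V'^k$$

where $V^k = W^k - W_{opt}$. Averaging over $V'^k$,

$$E[\epsilon_k^2] = \epsilon_{min}^2 + \sum_{j=0}^{M-1}\lambda_j E\big[(V_j'^k)^2\big]$$
So we need to know $E[(V_j'^k)^2]$, which are the diagonal elements of the covariance matrix of $V'^k$, $E[V'^kV'^{kT}]$.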


From the LMS update equation
$$W^{k+1} = W^k + 2\mu\epsilon_k X^k$$
we get
$$V'^{k+1} = V'^k + 2\mu\epsilon_k QX^k$$

so

$$E[V'^{k+1}V'^{k+1,T}] = E[V'^kV'^{kT}] + 2\mu E[\epsilon_k QX^kV'^{kT}] + 2\mu E[\epsilon_k V'^kX^{kT}Q^T] + 4\mu^2 E[\epsilon_k^2QX^kX^{kT}Q^T]$$


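Note that
$$\epsilon_k = d_k - W^{kT}X^k = d_k - W_{opt}^TX^k - V'^{kT}QX^k$$


so

$$E[\epsilon_k QX^kV'^{kT}] = E\big[d_kQX^kV'^{kT} - W_{opt}^TX^kQX^kV'^{kT} - V'^{kT}QX^kQX^kV'^{kT}\big]$$
$$= 0 - E\big[QX^kX^{kT}Q^TV'^kV'^{kT}\big]$$
$$= -QE[X^kX^{kT}]Q^TE[V'^kV'^{kT}]$$
$$= -\Lambda E[V'^kV'^{kT}]$$

(The first two terms together contribute zero, since $V'^k$ is independent of $X^k$ and $d_k$, and $E[(d_k - W_{opt}^TX^k)X^k] = P - RW_{opt} = 0$.)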


Note that the Patently False independence Assumption was invoked here. 


To analyze $E[\epsilon_k^2QX^kX^{kT}Q^T]$, we make yet another obviously false assumption: that $\epsilon_k^2$ and $X^k$ are
statistically independent. This is obviously false, since $\epsilon_k = d_k - W^{kT}X^k$. Otherwise, we get 4th-order terms in
$X$ in the product. These can be dealt with, at the expense of a more complicated analysis, if a particular type of
distribution (such as Gaussian) is assumed. See, for example, Gardner. A questionable justification for this
assumption is that as $W^k$ approaches $W_{opt}$, $W^k$ becomes uncorrelated with $X^k$ (if we invoke the original independence
assumption), which tends to randomize the error signal relative to $X^k$. With this assumption,


$$E[\epsilon_k^2QX^kX^{kT}Q^T] = E[\epsilon_k^2]\,E[QX^kX^{kT}Q^T] = E[\epsilon_k^2]\,\Lambda$$


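Since $E[\epsilon_k^2] = \epsilon_{min}^2 + E[V'^{kT}\Lambda V'^k]$,

$$E[\epsilon_k^2] = \epsilon_{min}^2 + E\Big[\sum_j \lambda_j (V_j'^k)^2\Big] = \epsilon_{min}^2 + \sum_j \lambda_j E\big[(V_j'^k)^2\big]$$


Thus, writing $C^k = E[V'^kV'^{kT}]$ for the covariance matrix of $V'^k$ and using $E[\epsilon_k QX^kV'^{kT}] = -\Lambda C^k$ (and its transpose, $E[\epsilon_k V'^kX^{kT}Q^T] = -C^k\Lambda$), the covariance update becomes

$$C^{k+1} = C^k - 2\mu(\Lambda C^k + C^k\Lambda) + 4\mu^2\Big(\epsilon_{min}^2 + \sum_j\lambda_jC_{jj}^k\Big)\Lambda$$


Now if this system is stable and converges, it converges to $C^\infty = C^{\infty+1}$:

$$2\mu(\Lambda C^\infty + C^\infty\Lambda) = 4\mu^2\Big(\epsilon_{min}^2 + \sum_j\lambda_jC_{jj}^\infty\Big)\Lambda$$

For $i \neq j$ this gives $(\lambda_i + \lambda_j)C_{ij}^\infty = 0$, so $C_{ij}^\infty = 0$; on the diagonal it gives

$$C_{ii}^\infty = \mu\Big(\epsilon_{min}^2 + \sum_j\lambda_jC_{jj}^\infty\Big)$$


So it is a diagonal matrix with all elements on the diagonal equal, $C_{ii}^\infty = c$. Then

$$c = \mu\Big(\epsilon_{min}^2 + c\sum_j\lambda_j\Big) \;\Rightarrow\; c\Big(1 - \mu\sum_j\lambda_j\Big) = \mu\epsilon_{min}^2 \;\Rightarrow\; c = \frac{\mu\epsilon_{min}^2}{1 - \mu\sum_j\lambda_j}$$


Thus the error in the LMS adaptive filter after convergence is

$$E[\epsilon_\infty^2] = \epsilon_{min}^2 + E[V'^{\infty T}\Lambda V'^\infty] = \epsilon_{min}^2 + c\sum_j\lambda_j = \epsilon_{min}^2\Big(1 + \frac{\mu\sum_j\lambda_j}{1 - \mu\sum_j\lambda_j}\Big) = \frac{\epsilon_{min}^2}{1 - \mu\,\mathrm{tr}(R)} = \frac{\epsilon_{min}^2}{1 - \mu M r_{xx}(0)}$$

$$E[\epsilon_\infty^2] = \frac{\epsilon_{min}^2}{1 - \mu M\sigma_x^2}$$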


$\frac{1}{1 - \mu M\sigma_x^2}$ is called the misadjustment factor. Often, one chooses $\mu$ to select a desired misadjustment factor,
such as an error 10% higher than the Wiener filter error.
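The misadjustment formula can be inverted to pick $\mu$ for a target excess error. A sketch with hypothetical helper names, assuming the second-order result above holds (i.e., $\mu M\sigma_x^2 < 1$):

```python
def steady_state_mse(eps_min2, mu, M, sigma_x2):
    """E[eps_inf^2] = eps_min^2 / (1 - mu M sigma_x^2)."""
    s = mu * M * sigma_x2
    if not 0.0 <= s < 1.0:
        raise ValueError("mu * M * sigma_x^2 must lie in [0, 1)")
    return eps_min2 / (1.0 - s)


def mu_for_excess(excess, M, sigma_x2):
    """Step size giving E[eps_inf^2] = (1 + excess) * eps_min^2.

    Solves 1 / (1 - mu M sigma_x^2) = 1 + excess for mu.
    """
    return excess / ((1.0 + excess) * M * sigma_x2)
```

For a 32-tap filter with unit input power, a 10% excess error corresponds to $\mu = 0.1/(1.1 \cdot 32) \approx 0.00284$.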


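2nd-Order Convergence (Stability)


To determine the range of $\mu$ for which the covariance recursion converges, we must determine the $\mu$ for which the matrix difference
equation for $C^k = E[V'^kV'^{kT}]$ converges:


$$C^{k+1} = C^k - 2\mu(\Lambda C^k + C^k\Lambda) + 4\mu^2\Big(\sum_j\lambda_jC_{jj}^k\Big)\Lambda + 4\mu^2\epsilon_{min}^2\Lambda$$


The off-diagonal elements each evolve independently according to $C_{ij}^{k+1} = \big(1 - 2\mu(\lambda_i + \lambda_j)\big)C_{ij}^k$. These terms will decay
to zero if $|1 - 2\mu(\lambda_i + \lambda_j)| < 1$ for all $i \neq j$, which holds if $\forall i: 4\mu\lambda_i < 2$, or $\mu < \frac{1}{2\lambda_{max}}$.


The diagonal terms evolve according to


$$C_{ii}^{k+1} = (1 - 4\mu\lambda_i)C_{ii}^k + 4\mu^2\lambda_i\sum_j\lambda_jC_{jj}^k + 4\mu^2\epsilon_{min}^2\lambda_i$$


For the homogeneous equation


$$C_{ii}^{k+1} = (1 - 4\mu\lambda_i)C_{ii}^k + 4\mu^2\lambda_i\sum_j\lambda_jC_{jj}^k$$


with $1 - 4\mu\lambda_i$ positive,

$$C_{ii}^{k+1} \le (1 - 4\mu\lambda_i)\max_j C_{jj}^k + 4\mu^2\lambda_i\sum_j\lambda_j\max_j C_{jj}^k = \Big(1 - 4\mu\lambda_i + 4\mu^2\lambda_i\sum_j\lambda_j\Big)\max_j C_{jj}^k$$


$C_{ii}^{k+1}$ will be strictly less than $\max_j C_{jj}^k$ for

$$1 - 4\mu\lambda_i + 4\mu^2\lambda_i\sum_j\lambda_j < 1$$

or
$$4\mu^2\lambda_i\sum_j\lambda_j < 4\mu\lambda_i$$
or
$$\mu < \frac{1}{\sum_j\lambda_j} = \frac{1}{\mathrm{tr}(R)} = \frac{1}{M r_{xx}(0)} = \frac{1}{M\sigma_x^2}$$


This is a more rigorous bound than the first-order bounds. Often engineers choose $\mu$ a few times smaller than this,
since more rigorous analyses yield a slightly smaller bound. $\mu < \frac{1}{3M\sigma_x^2}$ is derived in some analyses assuming
Gaussian $x_k$, $d_k$.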
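The steady-state result and the stability bound can both be checked by iterating the diagonal of the covariance recursion directly. This sketch (hypothetical helper names) assumes the off-diagonal terms start at zero, so they stay zero and only the diagonal entries need to be tracked; the example eigenvalues 1.9 and 0.1 are an assumption (so $\mathrm{tr}(R) = 2$ and the bound is $\mu < 0.5$).

```python
def iterate_coefficient_variances(eigenvalues, mu, eps_min2, c0=1.0, steps=20000):
    """Iterate the diagonal of the second-order (covariance) recursion:

    C_ii <- (1 - 4 mu lam_i) C_ii + 4 mu^2 lam_i (sum_j lam_j C_jj + eps_min^2)

    starting from C_ii = c0 for every i. Returns the final diagonal entries.
    """
    C = [c0] * len(eigenvalues)
    for _ in range(steps):
        weighted = sum(lam * c for lam, c in zip(eigenvalues, C))
        C = [(1.0 - 4.0 * mu * lam) * c + 4.0 * mu * mu * lam * (weighted + eps_min2)
             for lam, c in zip(eigenvalues, C)]
    return C


def predicted_variance(eigenvalues, mu, eps_min2):
    """Closed-form steady state: mu eps_min^2 / (1 - mu tr(R))."""
    return mu * eps_min2 / (1.0 - mu * sum(eigenvalues))
```

With $\mu = 0.05$ every diagonal entry settles at $\mu\epsilon_{min}^2/(1 - \mu\,\mathrm{tr}(R))$; with $\mu = 0.6$, which violates the bound, the entries grow without limit.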


Applications of Adaptive Filters 


Adaptive filters can be configured in a surprisingly large variety of ways for
use in a large number of applications. The same LMS algorithm can
be used to adapt the filter coefficients in most of these configurations; only
the "plumbing" changes.


Adaptive System Identification 


Note: To approximate an unknown system (or the behavior of that system) as closely as possible 


The optimal solution is $W_{opt} = R^{-1}P$.


Suppose the unknown system is a causal, linear time-invariant filter:


$$d_k = x_k * h_k = \sum_{i=0}^{\infty} x_{k-i}\, h(i)$$


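Now

$$P = \big(E[d_k x_{k-j}]\big) = \Big(E\Big[\sum_{i=0}^{\infty} h(i)\, x_{k-i}\, x_{k-j}\Big]\Big) = \Big(\sum_{i=0}^{\infty} h(i)\, E[x_{k-i}\, x_{k-j}]\Big) = \Big(\sum_{i=0}^{\infty} h(i)\, r_{xx}(j-i)\Big)$$

$$P = \begin{pmatrix} r(0) & r(1) & \cdots & r(M-1) & r(M) & r(M+1) & \cdots \\ r(1) & r(0) & \cdots & r(M-2) & r(M-1) & r(M) & \cdots \\ \vdots & & \ddots & & & & \\ r(M-1) & r(M-2) & \cdots & r(0) & r(1) & r(2) & \cdots \end{pmatrix}\begin{pmatrix} h(0) \\ h(1) \\ \vdots \\ h(M-1) \\ h(M) \\ \vdots \end{pmatrix}$$

where $r = r_{xx}$ and the rows are indexed by $j = 0, 1, \ldots, M-1$. If the unknown system $H$ is itself a length-M FIR filter ($h(M) = h(M+1) = \cdots = 0$), this reduces to

$$P = Rh$$


and


$$W_{opt} = R^{-1}P = R^{-1}(Rh) = h$$


FIR adaptive system identification thus converges in the mean to the corresponding M samples
of the impulse response of the unknown system.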


Adaptive Equalization 


Note: Design an approximate inverse filter to cancel out as much distortion 
as possible. 


In principle, $W_{opt}$ approximates the inverse of the distorting system, so that the overall response of the
top path is approximately an impulse. However, limitations on the form of
$W$ (FIR) and the presence of noise cause the equalization to be imperfect.


Important Application 


Channel equalization in a digital communication system. 


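[Figure: digital communication system: transmitted symbols $s_k$ pass through the channel and a matched filter to a decision device, which outputs $\hat s_k$]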


If the channel distorts the pulse shape, the matched filter will no longer be
matched, intersymbol interference may increase, and the system
performance will degrade.


An adaptive filter is often inserted in front of the matched filter to 
compensate for the channel. 


This is, of course, unrealizable, since we do not have access to the original
transmitted signal $s_k$ to use as the desired signal.


There are two common solutions to this problem: 


1. Periodically broadcast a known training signal. The adaptation is
switched on only when the training signal is being broadcast and thus
is known.
2. Decision-directed feedback: If the overall system is working well, then
the output of the decision device should almost always equal the (delayed) transmitted symbol. We can thus use
our received digital communication signal, after the decision device, as the desired signal, since
it has been cleaned of noise (we hope) by the nonlinear threshold
device!

[Figure: decision-directed equalizer, with a delay of $\Delta$]


As long as the error rate in the decisions $\hat s_k$ is not too high, this method
works. Otherwise, the desired signal is so inaccurate that the adaptive filter can never
find the Wiener solution. This method is widely used in the telephone
system and other digital communication networks.


Adaptive Interference (Noise) Cancellation 


Note: Automatically eliminate unwanted interference in a signal. 




The object is to subtract out as much of the noise as possible. 


Example: 
Engine noise cancellation in automobiles 


The firewall attenuates and filters the noise reaching the listener's ear, so it is not
the same as the noise picked up by the reference microphone near the engine. There is also a delay due to acoustic propagation in the air. For
maximal cancellation, an adaptive filter is thus needed to make the filtered reference noise as similar as
possible to the delayed noise at the listener's ear.


Inject the cancellation signal through
the stereo speaker!


Exercise: 


Problem: 


What conditions must we impose upon the microphone locations for this to 
work? (Think causality and physics!) 


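Analysis of the interference canceller


[Figure: interference canceller: primary input $s_k + n_k$; reference input $n'_k$, filtered by the adaptive filter to give $y_k$; output $\epsilon_k = \hat s_k$]

We assume $s_k$, $n_k$, and $n'_k$ are zero-mean signals, and that $s_k$ is independent of
$n_k$ and $n'_k$. Then

$$E[\epsilon_k^2] = E\big[(s_k + n_k - y_k)^2\big] = E[s_k^2] + 2E[s_k(n_k - y_k)] + E\big[(n_k - y_k)^2\big] = E[s_k^2] + E\big[(n_k - y_k)^2\big]$$

Since the input signal $n'_k$ has no information about $s_k$ in it, minimizing $E[\epsilon_k^2]$ can
only affect the second term, which is the standard Wiener filtering problem, with
solution

$$W_{opt} = R_{n'n'}^{-1}P_{nn'}$$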
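A small simulation illustrates the canceller. Everything about the signals is an assumption for illustration: $s_k$ is a slow sinusoid, the reference $n'_k$ is white Gaussian noise, and the path from the reference to the primary sensor is a made-up FIR filter `path`; the function name is hypothetical, and the weights are adapted with the LMS update given earlier.

```python
import math
import random


def noise_canceller_demo(n_samples=5000, M=4, mu=0.005, seed=1):
    """Hypothetical interference-cancelling setup with an LMS-adapted filter."""
    rng = random.Random(seed)
    ref = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]          # n'_k
    path = [0.0, 0.6, -0.3]                                        # assumed n' -> n path
    noise = [sum(path[i] * ref[k - i] for i in range(len(path)) if k >= i)
             for k in range(n_samples)]                            # n_k
    s = [math.sin(2.0 * math.pi * 0.01 * k) for k in range(n_samples)]
    W, X, out = [0.0] * M, [0.0] * M, []
    for k in range(n_samples):
        X = [ref[k]] + X[:-1]
        y_k = sum(w * xi for w, xi in zip(W, X))        # estimate of n_k
        e_k = s[k] + noise[k] - y_k                     # canceller output, ideally s_k
        W = [w + 2.0 * mu * e_k * xi for w, xi in zip(W, X)]
        out.append(e_k)
    return s, noise, out
```

After convergence, the residual noise in the output, $\epsilon_k - s_k = n_k - y_k$, is only a few percent of the original noise power, which is consistent with the Wiener solution plus a small misadjustment term.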


Adaptive Echo Cancellation 


An adaptive echo canceller is a specific type of adaptive interference
canceller that removes echoes. Adaptive echo cancellers are found in all
modern telephone systems.


[Figure: telephone connection: two-wire line to the home, hybrid, and long-distance (digital) link]


The hybrid is supposed to split the opposite-going waves, but typically
achieves only about 15 dB of suppression. The leaked signal will eventually reach
the other end and be coupled back, with a long delay, to the original source,
which gives a very annoying echo.
[Figure: echo canceller: the adaptive filter models the hybrid leakage transfer function]


Because the input to the adaptive echo canceller contains only the signal 
from the far end that will echo off of the hybrid, it cancels the echo while 
passing the near-end signal as desired. 


Narrowband interference canceller 


A sinusoid is predictable $\Delta$ samples ahead, whereas the broadband signal $s_k$ may not be, so the
sinusoid can be cancelled using the adaptive system in the Figure. This is
another special case of the adaptive interference canceller in which the
noise reference input is a delayed version of the primary (signal plus noise)
input. Note that $\Delta$ must be large enough so that $s_k$ and $s_{k-\Delta}$ are
uncorrelated, or some of the signal will be cancelled as well!
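The delayed-reference configuration can be simulated the same way. In this hypothetical sketch, $s_k$ is assumed to be white Gaussian noise (so it is unpredictable for any $\Delta \ge 1$), the interference is a single sinusoid, and the filter length, delay, and step size are arbitrary choices.

```python
import math
import random


def narrowband_canceller_demo(n_samples=8000, M=16, delay=5, mu=0.001, seed=2):
    """Hypothetical demo: primary_k = s_k + tone_k, reference = primary_{k-delay}.

    The filter can only predict the tone, so the error output e_k should
    approach the broadband signal s_k.
    """
    rng = random.Random(seed)
    s = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    tone = [2.0 * math.cos(2.0 * math.pi * 0.1 * k + 0.3) for k in range(n_samples)]
    primary = [a + b for a, b in zip(s, tone)]
    W, X, out = [0.0] * M, [0.0] * M, []
    for k in range(n_samples):
        ref = primary[k - delay] if k >= delay else 0.0   # delayed primary input
        X = [ref] + X[:-1]
        y_k = sum(w * xi for w, xi in zip(W, X))          # prediction of the tone
        e_k = primary[k] - y_k                            # canceller output
        W = [w + 2.0 * mu * e_k * xi for w, xi in zip(W, X)]
        out.append(e_k)
    return s, tone, out
```

Because $s_k$ is white here, $s_k$ and $s_{k-\Delta}$ are uncorrelated for any $\Delta \ge 1$; for a colored signal, $\Delta$ would have to exceed its correlation length.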

Exercise: 


Problem: 
How would you construct an "adaptive line enhancer" that preserves 
the sinusoids but cancels the uncorrelated noise? 
Other Applications
- Adaptive array processing
- Adaptive control
- etc.


Beyond LMS: an overview of other adaptive filter algorithms 


RLS algorithms 


FIR adaptive filter algorithms with faster convergence. Since the Wiener
solution can be obtained in one step by computing $W_{opt} = R^{-1}P$, most
RLS algorithms attempt to estimate $R$ and $P$ and compute $W_{opt}$ from
these.


There are a number of $O(N^2)$ algorithms which are stable and converge
quickly. A number of $O(N)$ algorithms have been proposed, but these are
all unstable except for the lattice filter method. This is described to some
extent in the text. The adaptive lattice filter converges quickly and is stable,
but reportedly has a very high noise floor.


Many of these approaches can be thought of as attempting to
"orthogonalize" R, or to rotate the data or filter coefficients to a domain
where R is diagonal, then doing LMS in each dimension separately, so that
a fast-converging step size can be chosen in all directions.


Frequency-domain methods 


Frequency-domain methods implicitly attempt to do this:


If $QRQ^{-1}$ is a diagonal matrix, this yields a fast algorithm. If Q is chosen
as an FFT matrix, each channel becomes a different frequency bin. Since R
is Toeplitz and not circulant, the FFT matrix will not exactly diagonalize
R, but in many cases it comes very close and frequency-domain methods
converge very quickly. However, for some R they perform no better than
LMS. By using an FFT, the transformation Q becomes inexpensive:
$O(N\log N)$. If one only updates on a block-by-block basis (once per N
samples), the frequency-domain methods only cost $O(\log N)$ computations
per sample, which can be important for some applications with large N
(say, 16,000,000).


Adaptive IIR filters 


Adaptive IIR filters are attractive for the same reasons that IIR filters are 
attractive: many fewer coefficients may be needed to achieve the desired 
performance in some applications. However, it is more difficult to develop 
stable IIR algorithms, they can converge very slowly, and they are 
susceptible to local minima. Nonetheless, adaptive IIR algorithms are used in 
some applications (such as low frequency noise cancellation) in which the 
need for IIR-type responses is great. In some cases, the exact algorithm used
by a company is a tightly guarded trade secret. 


Most adaptive IIR algorithms minimize the prediction error, to linearize the
estimation problem, as in deterministic or block linear prediction.
Thus the coefficient vector contains both the feedback (pole) and the feedforward (zero) coefficients,
and the "signal" vector contains the corresponding delayed samples of the output and input signals.
An LMS algorithm can be derived using the approximation $E[\epsilon_k^2] \approx \epsilon_k^2$ for the gradient,
and finally
$$W_{k+1} = W_k - U\hat\nabla_k$$


where the step sizes $\mu_i$ (the diagonal entries of $U$) may be different for the different IIR coefficients. Stability and
convergence rate depend on these choices, of course. There are a number of
variations on this algorithm.


Due to the slow convergence and the difficulties in tweaking the algorithm
parameters to ensure stability, IIR algorithms are used only if there is an
overriding need for an IIR-type filter.


The Constant-Modulus Algorithm and the Property-Restoral Principle 


The adaptive filter configurations that we have examined so far require a
"desired signal" $d_k$. There are many clever ways to obtain such a signal, but
in some potential applications a desired signal is simply not available.
However, a "property-restoral algorithm" can sometimes circumvent this
problem.


If the uncorrupted signal has special properties that are characteristic of the 
signal and not of the distortion or interference, an algorithm can be 
constructed which attempts to cause the output of the adaptive filter to 
exhibit that property. Hopefully, the adapting filter will restore that property 
by removing the distortion or interference! 


Example: 

the Constant-Modulus Algorithm (CMA) 

Certain communication modulation schemes, such as PSK and FSK,
transmit a sinusoid of a constant analytic magnitude. Only the frequency or
phase changes with time. The constant modulus algorithm tries to drive the
output signal to one having a constant amplitude:


$$\epsilon_k = |y_k|^2 - A^2$$


One can derive an LMS (or other) algorithm that seeks a Wiener filter 
minimizing this error. In practice, this works very well for equalization of 
PSK and FSK communication channels. 


CMA is simpler than decision-directed feedback, and can work for high 
initial error rates! 
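A minimal sketch of CMA for blind equalization, assuming QPSK symbols of unit modulus ($A^2 = 1$), an illustrative channel $1 + 0.3z^{-1}$, and a "center-spike" initialization of the weights; the function name and all parameter values are hypothetical. With $y_k = W^H X_k$ we have $\partial |y_k|^2/\partial W^* = X_k y_k^*$, so a stochastic-gradient step on $\epsilon_k^2$ is $W \leftarrow W - \mu\,\epsilon_k y_k^* X_k$ (constant factors folded into $\mu$).

```python
import random


def cma_equalizer_demo(n_samples=6000, M=5, mu=0.005, A2=1.0, seed=3):
    """Hypothetical blind-equalization demo of the constant-modulus algorithm.

    No desired signal is used: the error is eps_k = |y_k|^2 - A^2.
    Returns the final weights and the sequence of squared CM errors.
    """
    rng = random.Random(seed)
    qpsk = [complex(a, b) / 2 ** 0.5 for a in (1, -1) for b in (1, -1)]
    s = [rng.choice(qpsk) for _ in range(n_samples)]
    x = [s[k] + (0.3 * s[k - 1] if k >= 1 else 0) for k in range(n_samples)]
    W = [0j] * M
    W[M // 2] = 1 + 0j                     # "center spike" initialization
    X = [0j] * M
    dispersion = []
    for k in range(n_samples):
        X = [x[k]] + X[:-1]
        y_k = sum(w.conjugate() * xi for w, xi in zip(W, X))    # y_k = W^H X_k
        e_k = abs(y_k) ** 2 - A2                                # constant-modulus error
        W = [w - mu * e_k * y_k.conjugate() * xi for w, xi in zip(W, X)]
        dispersion.append(e_k * e_k)
    return W, dispersion
```

No training sequence is used: the constant-modulus property of the symbols alone drives the equalizer, and the symbols are recovered only up to an arbitrary phase rotation.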


This property-restoral idea can be used in any context in which a property- 
related error can be defined. 


Complex LMS 


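LMS for complex data and coefficients (such as quadrature communication
systems) takes the form


$$y_k = W_k^H X_k$$
$$\epsilon_k = d_k - y_k$$
$$W_{k+1} = W_k + \mu\,\epsilon_k^*\,X_k$$


It is derived in exactly the same way as LMS, using the following complex
vector differentiation formulas (with the complex gradient $\nabla_W = \frac{\partial}{\partial W_R} + j\frac{\partial}{\partial W_I}$, where $W = W_R + jW_I$):


$$\nabla_W(P^HW) = 0, \qquad \nabla_W(W^HP) = 2P, \qquad \nabla_W(W^HRW) = 2RW$$


or by differentiating with respect to the real and imaginary parts separately
and recombining the results.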
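The complex form can be sketched with Python's built-in complex numbers. This is a hypothetical system-identification example, assuming circularly symmetric complex white input and a noise-free desired signal $d_k = h^H X_k$; with the convention $y_k = W^H X_k$, the weights converge to $h$ itself. The function names are illustrative only.

```python
import random


def complex_lms(x, d, M, mu):
    """Complex LMS: y_k = W^H X_k, eps_k = d_k - y_k, W <- W + mu eps_k^* X_k."""
    W = [0j] * M
    X = [0j] * M
    for x_k, d_k in zip(x, d):
        X = [x_k] + X[:-1]
        y_k = sum(w.conjugate() * xi for w, xi in zip(W, X))
        e_k = d_k - y_k
        W = [w + mu * e_k.conjugate() * xi for w, xi in zip(W, X)]
    return W


def complex_identify_demo(h, n_samples=3000, mu=0.05, seed=4):
    """Hypothetical demo: identify h from d_k = h^H X_k with complex white input."""
    rng = random.Random(seed)
    x = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) / 2 ** 0.5
         for _ in range(n_samples)]
    d = []
    for k in range(n_samples):
        Xk = [x[k - i] if k >= i else 0j for i in range(len(h))]
        d.append(sum(hi.conjugate() * xi for hi, xi in zip(h, Xk)))
    return complex_lms(x, d, len(h), mu)
```

For example, `complex_identify_demo([0.5 + 0.2j, -0.3j, 0.1 + 0j])` returns weights very close to that vector.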


Normalized LMS
In "normalized" LMS, the gradient step factor $\mu$ is normalized by the
energy of the data vector:


$$\mu_{NLMS} = \frac{\alpha}{X_k^TX_k + \sigma}$$


where $\alpha$ is usually $\frac{1}{2}$ and $\sigma$ is a very small number introduced to prevent
division by zero if $X_k^TX_k$ is very small.


$$W_{k+1} = W_k + 2\mu_{NLMS}\,\epsilon_k X_k$$


The normalization has several interpretations:

1. corresponds to the 2nd-order convergence bound
2. makes the algorithm independent of signal scalings
3. adjusts $W_{k+1}$ to give zero error with the current input: $W_{k+1}^TX_k = d_k$
4. minimizes the mean error at time $k+1$


NLMS usually converges much more quickly than LMS at very little extra
cost; NLMS is very commonly used. In some applications, normalization is
so universal that "we use the LMS algorithm" implies normalization as
well.
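A sketch of NLMS in the same $2\mu$ form as the LMS update above (function name hypothetical). With $\alpha = 1/2$ and negligible $\sigma$, each update makes the a posteriori error $d_k - W_{k+1}^TX_k$ essentially zero (interpretation 3), and scaling both $x$ and $d$ by the same constant leaves the weight trajectory unchanged apart from the tiny $\sigma$ (interpretation 2).

```python
def nlms(x, d, M, alpha=0.5, sigma=1e-8):
    """Normalized LMS: mu_k = alpha / (X_k^T X_k + sigma); W <- W + 2 mu_k eps_k X_k.

    Returns the final weights and the a posteriori errors d_k - W_{k+1}^T X_k.
    """
    W = [0.0] * M
    X = [0.0] * M
    post_errors = []
    for x_k, d_k in zip(x, d):
        X = [x_k] + X[:-1]
        e_k = d_k - sum(w * xi for w, xi in zip(W, X))        # a priori error
        mu_k = alpha / (sum(xi * xi for xi in X) + sigma)     # energy-normalized step
        W = [w + 2.0 * mu_k * e_k * xi for w, xi in zip(W, X)]
        post_errors.append(d_k - sum(w * xi for w, xi in zip(W, X)))
    return W, post_errors
```

Because the step is normalized by the input energy, the same `alpha` works whether the input has unit power or a power a million times larger.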


Summary of Adaptive Filtering Methods 


1. LMS remains the simplest and best algorithm when slow convergence
is not a serious issue (typically used). $O(N)$

2. NLMS: a simple extension of LMS with much faster convergence in
many cases (very commonly used). $O(N)$

3. Frequency-domain methods offer computational savings ($O(\log N)$)
for long filters and usually offer faster convergence, too (sometimes
used; very commonly used when there are already FFTs in the system).

4. Lattice methods are stable and converge quickly, but cost
substantially more than LMS and have higher residual EMSE than
many methods (very occasionally used). $O(N)$

5. RLS algorithms that converge quickly and are stable exist. However,
they are considerably more expensive than LMS (almost never used). $O(N^2)$

6. Block RLS (least squares) methods exist and can be pretty efficient in
some cases (occasionally used). $O(\log N)$, $O(N)$, $O(N^2)$

7. IIR methods are difficult to implement successfully and pose certain
difficulties, but are sometimes used in some applications, for example
noise cancellation of low frequency noise (very occasionally used).

8. CMA: very useful when applicable (blind equalization); CMA is the
method for blind equalizer initialization (commonly used in a few
specific equalization applications). $O(N)$


Note: In general, getting adaptive filters to work well in an application is
much more challenging than, say, FFTs or IIR filters; they generally
require lots of tweaking!


